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Abstract 

We discuss recent developments in the modeling of negative autoregulated genetic 
networks. In particular, we consider the temporal evolution of the population of 
mRNA and proteins in simple networks using rate equations. In the limit of low 
copy numbers fluctuation effects become significant and more adequate modeling is 
then achieved using the master equation formalism. The analogy between regulatory 
gene networks and chemical reaction networks on dust grains in the interstellar 
medium is discussed. The analysis and simulation of complex reaction networks are 
also considered. 
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1 Introduction 



Recent advances in molecular biology techniques for the engineering of syn- 
thetic networks have made possible the measurement of populations of mRNA's 
and proteins in simple genetic networks. Measurements of the average protein 
content of cells an d their time depen dence enabled to quantify the behavior of 
genetic networks ([Kalir et al. .1120011 ). These measurements have been modeled 
using rate equations, mainly under quasi steady state co nditions. However, real 
biolo gical systems are likely be away from steady state ( Smith. 19681 Murray. 
1989). Furthermore, many components of cells appear in low copy numbers 



and are therefore subjected to large fluctuations. Recently, such fluctuations 
at the level of a single ce ll were measured experimentally using the green 



2002, Paulsson 



fluore scent protein (GFP) ijElowitz et al. .1120021 ISwain et al 
2004). Measurements of protein levels in single cells revealed distributions 



that depend on the topology of the regulatory network controlling the partic- 
ular protein. For exa mple, it was shown that neg ative autoregulated networks 
reduce fluctuations ( Becskei and Serrano. 2000f ). The modeling of these fluc- 
tuations cannot be done using rate equations and requires the master equation 
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formalism dMcAdams and Arkin] 1997 , 19991 Paulsson and Ehrenberg. 2000, 
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In this paper we consider the modeling of negative autoregulated genetic net- 
works in cell populations and in single cells. We focus on the simplest network 
in which a single protein serves as a repressor for the production of its own 
mRNA. Such networ k may serve as a module or "network motif" in complex 



regulatory networks ( Milo et al.. 2002L 2004 ). We describe the time depen- 



dence of the system using rate equations. In commonly used models it is as- 
sumed that the population of the bound repressor proteins is in quasi steady 
state. We consider the dynamics of the network when this assumption does not 
hold. We show that in such cases the commonly used models underestimate 
the response of the system to variations in the external conditions. In such 
cases one should take into account the bound repressors as a separate popula- 
tion. In the limit of low copy numbers of the mRNA's and proteins stochastic 
noise becomes significant. We show that in this limit the rate equations should 
be replaced by a master equation. The rate and master equations used in the 
analysis of genetic networks are closely related to those that describe chem- 
ical reaction networks on small grains. In this context, the limit of low copy 
number is achieved for reaction networks on interstellar dust grains, due to 
the sub-micron size of the grains and the extremely low flux due to the low 
density of the interstellar gas. This analogy is discussed and results obtained 
for grain chemistry, which may also be useful for genetic network analysis, are 
presented. 

The paper is organized as follows: in Sec. 2 we consider the dynamic behavior 
of a simple genetic network in a cell population using rate equations. In Sec. 3 
we consider the limit in which each cell contains a small population of proteins, 
where the stochastic features become significant. The master equation for this 
system is presented. The analogy between genetic networks and grain-surface 
chemistry is discussed in Sec. 4. A summary is presented in Sec. 5. 



2 Rate equations 



In genetic autoregulatory circuits the production rate of a certain product 
protein A depends on its population size, [A] (given by the average number 
of such proteins in a cell). In negative autoregulation, increasing the popula- 
tion size [A] d ecreases the rate of production. This m echanism is commonly 
approximated ( Rosenfeld et al.. 2002L Paulsson. 20021 ) by the Hill function 



9(A) 



k[A] 



2 



where g(A) is the production rate of A proteins, g max is the maximal pro- 
duction (achieved in conditions where [A] = 0) and k is an affinity constant. 
This approximation is in agreemen t with experiments done at steady state 
( Yagil and YagilJll971 . Yagil. 1975| ). Here we consider the following circuit: a 
population size [R] of mRNA's is produced with a maximal rate g R and de- 
grades at rate d R . This mRNA produces a protein A which acts as a repressor 
and controls the production rate of the mRNA. The production rate of A is 
thus proportional to [R] and its degradation rate is d A . The intracell dynamics 
is described by the rate equations 
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where the dots represent time derivatives, namely [R] = d[R]/dt. These equa- 
tions have two steady state solutions, however, only one of them is relevant 
because the other exhibits negative population sizes. The relevant solution is 
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and the convergence to this solution is fast ( Rosenfeld et al.. 2002h . However, 
these equations do not take into account explicitly the chemical mechanism 
which enables the regulation. In this mechanism, one of the A proteins bounds 
to the repression site on the DNA and inhibits the mRNA production. This 
protein should be subtracted from the population of free proteins in the cell, 
which Eq. (2) does not do. In addition, the constant k in the Hill function 
captures only the steady state repression rate and not its dynamical behavior. 

The dynamics of the repression mechanism can be incorporated into the rate 
equation by taking the bound protein as a third component in the reaction 
network. This gives rise to three dynamic equations: 



[R]=g R (l-[r])-d R [R] 

[A] =g A [R] - d A [A] - a [A}(l - [r]) + e*i[r] (4) 
[r]=a [A}(l-[r])- ai [r] 
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where [r] represents the average population of bound repressors in a cell. Since 
there is only a single repression site in each cell, [r] is limited to the range 
< [r] < 1. In fact, it represents the fraction of time in which the repressor 
site on the DNA is occupied by a bound repressor. The average productivity 
of the DNA in producing mRNA's is proportional to 1 — [r]. The binding 
coefficient ao is the rate in which a free protein becomes bound. This rate 
should be multiplied by the number of free proteins and by the average number 
of unoccupied repression sites per cell, 1 — [r]. The desorption coefficient a± 
is the rate in which a bound protein leaves the repression site. The reduced 
rate equation set given by Eq. (2) is an approximation to the extended set of 
Eq. (4) in the following manner: when «o and ol\ are large compared to other 
rate constants, [r] approaches steady state much faster than [A] and [R]. In 
this case, it is justified to assume that [r] is in quasi steady state and impose 
[r] = 0. This gives the steady state solution 

_ a [A} 

[l ~a 1 + a [AY [b) 



Substituting this solution into Eq. (4) gives the reduced set of Eq. (2), with 
k = a /«i- This implies that Eq. (3) is the steady state solution of Eq. (4) 
as well. This solution is stable and there are no oscillations for any values 
of the parameters. However, the time dependent solutions of Eq. (2) and of 
Eq. (4) are not the same. Whereas Eq. (2) assumes rapid convergence of [r] 
into its steady state, Eq. (4) holds also in case that the relaxation time is 
long. In Figs. 1 and 2 we compare the dynamics described by the two sets of 
equations. The rate constants are = 0.05, qa = 0.06, d R = 0.02, d& = 0.02, 
a = 0.001 and a± = 0.001 (all in units of s" 1 ). These rates represent typical 
transcription and translation times, which are of the order of 10 to 20 seconds. 
Typical half-life times of proteins a nd mRNA's vary in the range of several 
minutes ( Elowitz and Leibler. 2000f ). All these time scales are much shorter 



than the cycle time, which is typically around 30 minutes. 

The dynamical behavior of [A] turns out to be different in the two sets of 
equations. The deviations from steady state are much larger in the extended 
set of equations. The dynamics is also highly dependent on the initial condition 
of [r] which is an additional degree of freedom that does not exist in the 
reduced set. In Fig. 1 where the initial condition is [r] = 0, the extended set 
shows an over-shoot in A production, while in Fig. 2 where the initial condition 
is [r] = 1, it shows an under-shoot in A production. 

In some cases the regulation of the production of a protein A is mediated by 
a more complex molecule. For example, the repressor may be a molecule D 
which is a dimer of A molecules produced by the reaction A + A — > D. The 
standard way of modeling such a circuit is to modify the repression term (the 
Hill function) in Eq. (2) to 
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Fig. 1. Intracell dynamics as calculated by Eq. (2) (solid line) and by Eq. (4) 
(dashed line). The average amount of bound proteins [r] is also shown (dotted line). 
The initial conditions are [A] = 3 and [r] = 0. 

[A] = g A [R]-d A [A]. (6) 



For this system the extended set includes equations for [R] and [A] , as well as 
for the dimer (repressor) population [D] and for the bound repressor [r]. The 
equations take the form: 



[R]=g R (l-[r])-d R [R] (7a) 

[A]= g A [R]-d A [A]-2a 2 [A] 2 (7b) 
[D] = a 2 [Af - d D [D] - a [D](l - [r]) + c^r] (7c) 
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Fig. 2. Intracell dynamics as calculated by Eq. (2) (solid line) and by Eq. (4) 
(dashed line). The average amount of bounded proteins [r] is also shown (dotted 
line). The initial conditions are [A] = 3 and [r] = 1. 



[r]=a [D](l-[r])- ai [r] 



(7d) 



As in the ordinary case in which the repressor is the protein A itself, the 
inhibition term 1 — [r] in Eq. (7b) is equal to the Hill function of Eq. (6) in the 
limit of rapid relaxation of [r]. In this case k = olqoi2 / (ol\(Ld) ■ However, when 
the repressor is the dimer D, there is an additional term in Eq. (7c) which has 
no analogue in Eq. (6). This term gives rise to a difference in the results of 
the reduced and the extended sets even in the steady state solution, as shown 
in Fig 3. The steady state solution of the extended set is stable and exhibits 
no oscillations. The parameters used in Fig 3 are the same as in Figs. 1 and 2, 
and the additional parameters are the degradation rate of dimers, do — 0.02 
(s _1 ), and the production rate coefficient of dimers, a2 = 0.01 (s -1 ). The 
latter coefficient is determined by the diffusion rate of proteins in the cell. A 
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Fig. 3. The populations of proteins A as obtained from the reduced set (solid line) 
and from the extended set (dashed line) and of dimers (repressor) D (dashed-dotted 
line) and bound repressors r (dotted line) as obtained from the extended set, as a 
function of time. The initial conditions are [A] = 3, [D] = 1 and [r] = 1. 



related quantity, namely, t he time it takes for a protein to diffuse across the 
cell was recently measured ( Elowitz et al.. 19991 ) and found to be of the order 



of one second. The inverse of this time can be used as an upper bound for the 
production rate coefficient ai- 

The reduced set of equations does not take into account explicitly the dimer 
population, which is responsible for the repression. Both Eqs. (6) and (7) 
do not take into account the fact that one needs at least two A proteins 
simultaneously in the cell in order to produce a dimer. Therefore, when the 
population of A proteins goes down to order 1 both equations fail and the 
master equation formalism is required. 
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3 The Master Equation 



Rate equations are used to describe the dynamics of the average number of en- 
tities (such as proteins) in large populations such as those handled in in vitro 
experiments. In these equations it is assumed that the densities of substances 
are continuous variables that behave in a deterministic fashion. This approach 
is not suitable for genetic regulatory ne tworks when the populations of the rel- 



evant species in a single cell are small ( Gil espie 



1977LlKoll99lLll99ilMc"Adams and ArkinJl99flL 



1977, Nicolis and Prigpgine 



SzallasiJll999llGibson and Miolsnes 



2001). In this case one should take into account the discrete nature of the 
populations and the fact that for small populations the fluctuations become 
significant. In negative regulatory systems there is a population of free repres- 
sors in the cell. In addition, there is a single repression site on the DNA where 
a single repressor molecule may bound. Therefore, each repression site can be 
either occupied by a repressor molecule (where r = 1) or vacant (r = 0). Thus, 
r cannot take any intermediate values. In such cases fluctuations may have 
an important impact on the processes involved and their dynamics should be 
described in more detail. 



One of the approaches suggested is the use of stochastic si mulations which 
take into account the dynamics of all participating substances (lGillespie.lll97'i 



Gibson and Bruck. 



Mc Adams and Arkin.lll997l iMorton-Firth and Brav.11199 
2000j). The difficulty with these simulations is that they are based on the ac- 
cumulation of large amounts of statistical data, and thus require extensive 
computer simulations. Thus, this approach is not always feasible in the case 
of complex networks which involve a large number of proteins. A comple- 
mentary approach is based o n dir e ct integration of the the master equation 
( McAdams and Arkin. 1997 , 19991 Paulsson and Ehrenberg. 2000l Paulsson 



2000L iKeoler and ElstonJ l200ll IPaulssonJ 120021 l2004h . This approach takes 
into account the probability distribution of all possible states of the system, 
and not only the average values as in the rate equation approach. It captures 
the time evolution of the probabilities of all the microscopic states of the 
system. 



We now apply the master equation approach to study the negative autoregu- 
latory circuit of Eq. 4. We denote the number of copies of the free protein A 
by ha and of the mRNA by n R . The number of proteins A which are bound to 
the repression site on the DNA is given by n r . For a single repression site n r 
can only take the values or 1. The master equation follows the time evolution 
of the probability distribution P(n R ,n A ,n r ). It takes the form 



P{n R , n A , n r = 1) = gAn R [P{n R , n A -1,1)- P(n R , n A , 1)] 
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+ d R [(n R + l)P(n R + 1, n A , 1) - n R P(n R , n A , 1)] 
+ d A [(n A + l)P(n R , n A + 1,1)- n A P(n R , n A , 1)] 
+ a o ((n A + l)P(n R ,n A + l,0) 

- aiP(n R , n A , 1) (8a) 
P(n R , ua, n r = 0) = g A n R [P(n R , n A - 1, 0) - P(n R , n^, 0)] 

+ d R [(n R + l)P(n R + l,n A ,0) - n R P(n R , n A , 0)] 
+ d A [(n A + l)P(n i? , + 1,0)- n A P(n R , n A , 0)] 

- a n A P(n R , n A , 0) 
+ aiP(n R ,n A - 1, 1) 

+ Sfil^K - 1, n A , 0) - P(n fl , n A , 0)], (8b) 



where the two cases of n r = and n r = 1 are presented separately. The 
first terms in the equations describe the formation of a new protein. The sec- 
ond and third terms describe the degradation of the mRNA and the protein, 
respectively, while the fourth and fifth terms describe the binding and un- 
binding of a protein to the repression site on the DNA. Eq. (8b) also includes 
a term that corresponds to the formation of a new mRNA (not possible in 
the repressed case). These equations can be integrated numerically in order 
to obtain the time dependence of the probability distribution. It can also be 
solved for steady state by taking P(n R ,n A ,n r ) = 0. 

The master equation provides all the moments of the distribution P(n R , n A , n r ) 
and their time dependence. For example, the average population of proteins 
A is given by 

„max „max 

n R n A i 

( u a) = Yl n A P(n R , n A , n r ) (9) 

njj=0 tia=0 n r =0 



where ng ax and n™ ax are the cutoff values that provide upper bounds on the 
populations of mRNA molecules and A proteins in the cell, respectively. The 
repression site can be either occupied (n r = 1) or unoccupied (n r = 0). 

Solving the master equation under steady state conditions for systems with 
different rate constants we calculated the appropriate averages, and compared 
the results with the rate equations. 

In Fig. 4 the average levels of free proteins, mRNA molecules and bound 
protein (repressor) in the cell (at steady state), are shown vs. a , as obtained 
from the master equation (solid line) and the rate equations (dashed line). 
The rate equations turn out to overestimate the average level of proteins and 
mRNA molecules, by a factor of 2-4 for systems with low copy number of 
proteins. On the other hand, when the average number of proteins in the cell 
is large, the results of the rate equations and master equation coincide. 
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Fig. 4. The steady state populations of free proteins, mRNA's and bound proteins 
(repressor) vs. the rate constant ao, calculated using the master equation (solid 
line) and the rate equations (dashed line). 

Mathematically the discrepency between the results of the rate equations and 
the master equations is due to non-linear terms such as the term that describe 
the attachment rate of proteins to the repression site. In the rate equation, this 
term is given by ao[A] (1 — [r]), namely as a product of averages (first moments). 
In the master equation it is given by the second moment ao(nA(l — tir)). The 
formation of dimers is also described by a nonlinear term. In the rate equations 
this term is given by a^f^] 2 , namely it depends only on the first moment. In 
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the master equation it is given by a2{n\) — a^n^), thus it depends on both 
the first and second moments. 



The simple networks studied here can be considered as modules or motifs in 
complex genetic networks. However, the simulation of complex networks using 
the master equation is difficult. This is due to the proliferation in the number 
of equations as the number of components (mRNA's and proteins) increases. 
Consider, for example, a network that involves three protein species, A, B and 
C. The master equation is written in terms of the probabilities P(ua, Ub, n>c) 
of having a certain population of proteins. The population size of each protein 
is limited by an upper cutoff. For example, the population of protein A takes 
the values ha = 0, 1, . . . , n™ ax . Clearly, the number of equations increases 
exponentially with the number of species, making this approach infeasible for 
complex networks. 

However, typically these networks are sparse, namely most pairs of proteins 
do not interact with each other. This feature makes it possible to divide the 
master equation into several sets of equations, each set including only a small 
number of protein species. For example, if proteins B and C do not interact, 
the master equation described above can be broken into two sets that involve 
-Pab(^5^_b) and P\c( n A, n c)- m the case of large and sparse networks this 
dramatically reduces the number of equations and thus enables the simulation 
of complex networks using the master equation. This technique, named the 
multi-plane method, was recently pro posed in the context of chem ical reaction 
networks on interstellar dust grains (|Lipshtat and Biham. 2004 l. The math- 



ematical structure of these networks is similar to that of genetic networks. 
Thus, the multi-plane method is perfectly applicable for the simulations of 
complex genetic networks. The similarity between the two systems is briefly 
discussed below. 



4 Discussion: Genetic Networks and Grain-Surface Chemistry 



Processes which exhibit a similar mathematical structure to the genetic net- 
work dynamics appear in the context of chemical reaction networks on in- 
terstellar dust grains. The chemistry of interstellar clouds consists of reac- 
tions taking place in the gas p hase as well as on the surfaces of dust grains 



( Hartquist and Williams.1119 95). It turns out that the most abundant molecule 
in the Universe, namely molecular hydrogen does no t form in the gas phase but 
on dust grain surfaces 



Gou 



d and Salpeter. 19631 Hollenbach and Salpeter 



1971L iHollenbach et al.Jll971| ). These grains are made of amorphous silicate 
and carbon compounds and are of sub-micron size. In addition to the forma- 
tion of molecular hydrogen, these grains support complex reaction networks 
that produce a variety of molecules that consist of hydrogen, oxygen, car- 
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Table 1 



Analogy between the processes of surface chemistry and of gene regulation. 
Description Surface chemistry Gene regulation 



system 


dust grain 


cell 


break-up mechanism 


grain fragmentation 


cell division 


mobility 


surface diffusion 


diffusion in cell 


addition — > A 


flux F 


transcription gji, production g^ 


removal A — > 


desorption W 


degradation dn,d,A 


typical reaction 


A + B -> C + D 


A^ A + B 


feedback regulation 


rejection: F(l — 6) 


repression: gn{l — [r]) 



bon and nitrogen. Here we discuss the similarity between the mathematical 
descriptions surface reaction networks and genetic networks. In particular, 
we suggest that computational methodologies developed in the context of in- 
terstellar grain chemistry are likely to be useful for the analysis of genetic 
networks. 



Consider a dust grain exposed to a flux of atomic and molecular species such 
as H, O, OH and CO. Atoms and molecules that hit and stick to the grain 
hop as random walkers between adsorption sites on its surface. When two 
atoms/molecules encounter one another they may react and form a more 
complex molecule. The rate equations that describe the reaction networks 
on grains include flux terms, desorption terms and reaction terms. The flux 
terms represent the flow of atoms and molecules from the gas phase onto 
the surface. The desorption rates are proportional to the population sizes of 
atoms and molecules on the grains, while the reaction terms are proportional 
to the products of the population sizes of the reactive species. In general, the 
rate equations resemble those that describe genetic networks. The analogy 
between the two systems is summarized in Table 1. In both systems reactive 
species are added, diffuse, react and removed. The system itself may break 
up (cell division or grain fragmentation), dividing the population of reactive 
species into two sub-populations. Both systems exhibit some kind of nega- 
tive feedback. In genetic networks this is provided by the repression circuit, 
in which the rate of attachment of proteins to the repression site is given by 
cko[A](1 — [r]). Certain surface reaction systems exhibit the Langmuir rejec- 
tion behavior, in which atoms from the gas phase that hit the surface in the 
vicinity of an already adsorbed atoms are rejected. The flux term F is then 
modified to the form F(l — 9), where 9 is the coverage, namely the fraction of 
adsorption sites on the surface that are occupied by adsorbed atoms. In the 
context of grain-surface chemistry, low copy numbers are obtained in the limit 
of small grains under conditions of low flux. In this limit the master equa- 
tion is required ( Biham et ah. 200 ll Green et ah. 2001 , Biham and Lipshtat.1 
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2002t ). For complex reaction networks of multiple species, the master equation 



becomes infeasible due to the proliferation in the number of equations. In this 
case, the multi-plan e method is used in order to keep the number of equations 
at a tractable level ((Lipshtat and Biham.ll2004 ). 



5 Summary 



We have considered the rate equation and master equation approaches to the 
modeling of genetic networks. In particular, we have studied the temporal 
evolution of the population of mRNA and proteins in simple negative au- 
toregulated genetic networks. As long as the populations of all the reactive 
components of the network are not too small, rate equations provide a good 
quantitative description of the network dynamics. However, once the popula- 
tions of the mRNA or proteins are reduced to order 1 or less, rate equations 
are no longer suitable and the master equation is needed. This is due to the 
fact that the rate equations involve only average quantities, while the mas- 
ter equation takes into account the discrete nature of the populations as well 
as the fluctuations. The simple networks studied here can be considered as 
modules or motifs in complex genetic networks. The simulation of complex 
networks using the master equation is difficult, because the number of equa- 
tions quickly proliferates. The multi-plane methodology, recently developed 
in the context of grain-surface chemistry, that tackles this problem is briefly 
described. Finally, the analogy between genetic networks and grain-surface 
chemistry is discussed. 

We thank J. Paulsson for illuminating discussions. 
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